Sparse Quadratic Discriminant Analysis For High Dimensional Data

Authors

  • Jun Shao
  • Quefeng Li
Abstract

Many contemporary studies involve the classification of a subject into two classes based on n observations of the p variables associated with the subject. Under the assumption that the variables are normally distributed, the well-known linear discriminant analysis (LDA) assumes a common covariance matrix over the two classes while the quadratic discriminant analysis (QDA) allows different covariance matrices. When p is much smaller than n, even if they both diverge, the LDA and QDA have the smallest asymptotic misclassification rates for the cases of equal and unequal covariance matrices, respectively. However, modern statistical studies often face classification problems with the number of variables much larger than the sample size n, and the classical LDA and QDA can perform poorly. In fact, we give an example in which the QDA performs as poorly as random guessing even if we know the true covariances. Under some sparsity conditions on the unknown means and covariance matrices of the two classes, we propose a sparse QDA based on thresholding that has the smallest asymptotic misclassification rate conditional on the training data. We discuss an example of classifying normal and tumor colon tissues based on a set of p = 2000 genes and a sample of size n = 62, and another example of a cardiovascular study for n = 222 subjects with p = 2434 genes. A simulation is also conducted to check the performance of the proposed method.
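The thresholding idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' estimator: it assumes a plain hard threshold on the off-diagonal entries of each class's sample covariance, equal class priors, and an eigenvalue floor added only so the thresholded matrices stay invertible. The abstract also places sparsity conditions on the means, which this sketch ignores. The names `hard_threshold_cov`, `make_pd`, `fit_sparse_qda`, `predict`, and the parameters `t` and `floor` are hypothetical.

```python
import numpy as np

def hard_threshold_cov(X, t):
    """Sample covariance with off-diagonal entries of magnitude <= t set to 0."""
    S = np.cov(X, rowvar=False)
    S_t = np.where(np.abs(S) > t, S, 0.0)
    np.fill_diagonal(S_t, np.diag(S))  # always keep the variances
    return S_t

def make_pd(S, floor=1e-3):
    """Floor the eigenvalues so S is invertible (a safeguard assumed here)."""
    w, V = np.linalg.eigh(S)
    return (V * np.maximum(w, floor)) @ V.T

def fit_sparse_qda(X1, X2, t):
    """Return (mean, covariance, log-determinant) for each of the two classes."""
    params = []
    for X in (X1, X2):
        S = make_pd(hard_threshold_cov(X, t))
        params.append((X.mean(axis=0), S, np.linalg.slogdet(S)[1]))
    return params

def predict(params, X):
    """Assign each row of X to class 1 or 2 by the smaller quadratic score."""
    scores = []
    for mu, S, logdet in params:
        D = X - mu
        maha = np.einsum("ij,ij->i", D, np.linalg.solve(S, D.T).T)
        scores.append(maha + logdet)  # equal priors assumed
    return np.where(scores[0] <= scores[1], 1, 2)

# Usage on synthetic data with unequal covariances (illustrative only).
rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(200, 5))
X2 = rng.normal(2.0, np.sqrt(2.0), size=(200, 5))
params = fit_sparse_qda(X1, X2, t=0.3)
labels = predict(params, np.vstack([X1, X2]))
```

In practice the threshold `t` would be chosen by cross-validation or a similar data-driven rule; the fixed value above is only a placeholder.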


Similar resources

A Direct Approach for Sparse Quadratic Discriminant Analysis

Quadratic discriminant analysis (QDA) is a standard tool for classification due to its simplicity and flexibility. Because the number of its parameters scales quadratically with the number of the variables, QDA is not practical, however, when the dimensionality is relatively large. To address this, we propose a novel procedure named QUDA for QDA in analyzing high-dimensional data. Formulated in...

Innovated Interaction Screening for High-dimensional Nonlinear Classification

This paper is concerned with the problems of interaction screening and nonlinear classification in a high-dimensional setting. We propose a two-step procedure, IIS-SQDA, where in the first step an innovated interaction screening (IIS) approach based on transforming the original p-dimensional feature vector is proposed, and in the second step a sparse quadratic discriminant analysis (SQDA) is pr...

Functional Linear Discriminant Analysis for Irregularly Sampled Curves

We introduce a technique for extending the classical method of Linear Discriminant Analysis to data sets where the predictor variables are curves or functions. This procedure, which we call functional linear discriminant analysis (FLDA), is particularly useful when only fragments of the curves are observed. All the techniques associated with LDA can be extended for use with FLDA. In particular ...

Classification of high-dimensional data for cervical cancer detection

In this paper, the performance of different generative methods for the classification of cervical nuclei is compared in order to detect cancer of the cervix. These methods include classical Bayesian approaches, such as Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA) or Mixture Discriminant Analysis (MDA) and a high-dimensional approach (HDDA) recently developed. The class...

Sparse Linear Discriminant Analysis with Applications to High Dimensional Low Sample Size Data

This paper develops a method for automatically incorporating variable selection in Fisher’s linear discriminant analysis (LDA). Utilizing the connection of Fisher’s LDA and a generalized eigenvalue problem, our approach applies the method of regularization to obtain sparse linear discriminant vectors, where “sparse” means that the discriminant vectors have only a small number of nonzero compone...



Publication date: 2014